Abstract.

We continue the study of the assignment problem for a random cost matrix. We analyse
the number of $k$-cycles in the solution and their dependence on the symmetry of the random
matrix. We observe that for a symmetric matrix one-cycles and two-cycles are dominant in the optimal
solution. In the antisymmetric case the situation is the opposite and one-cycles and two-cycles
are suppressed. We solve the model for a pure random matrix (without correlations between
its entries) and give analytic arguments to explain the numerical results in the symmetric and
antisymmetric cases. We show that the results can be explained to great accuracy by a simple
ansatz that connects the expected number of $k$-cycles to that of one-cycles and two-cycles.



PACS numbers: 02.60.Pn, 02.70.Rr, 64.60.Cn 

1. Introduction 

The assignment problem (AP) for a given cost or distance‡ matrix $(d_{ij})$, $i,j = 1,\dots,N$,
consists in finding the permutation $\sigma \in S_N$ that minimises the total distance $\sum_{i=1}^{N} d_{i\sigma(i)}$.

There are other problems related to this with additional constraints on the permutations 
allowed. Probably, the most renowned one is the traveling salesman problem (TSP) that can 
be formulated like the previous AP but admitting only cyclic permutations (we insist that 
unlike in the standard TSP our matrix does not need to be a true distance matrix). The list 
includes also the minimum weight simple matching problem (SMP), where only permutations
composed of two-cycles are allowed (obviously in this case N has to be even), and the
somewhat opposite case of the minimum weight directed 2-restricted 1-factor problem (1FP),
for which one-cycles and two-cycles are forbidden. If the matrix is symmetric the latter
problem can also be seen as a minimum weight non-directed 2-factor problem (2FP).

From the point of view of complexity theory, it is well known (see [1]-[4]) that the TSP
is NP-hard while the 2FP, the AP and the SMP can be solved in a time that scales polynomially
with N.

In this paper we are interested in the study of the AP for random cost or distance 
matrices. This problem has been studied for many years, focusing mainly on the minimal 
distance D(AP). For example, for random matrices whose entries have probability density
$\rho(d_{ij}) = \exp(-d_{ij})\,\theta(d_{ij})$ ($\theta$ is the Heaviside step function), it was first conjectured by G.
Parisi [5] and then proved rigorously ([6]-[9]) that the expected length is

$$\langle D(AP)\rangle = \sum_{m=1}^{N} \frac{1}{m^2}, \qquad (1)$$

with N the number of points to be matched. Furthermore, for general random distances whose
densities behave like $\rho(r) = 1 - ar + O(r^2)$ near $r = 0$, it is known ([11]-[14]) that

$$\langle D(AP)\rangle = \zeta(2) - \frac{2(1-a)\zeta(3) + 1}{N} + O(N^{-2}), \qquad (2)$$

where $\zeta(x)$ is the Riemann zeta function.

‡ We use the term distance matrix although its entries are not necessarily true distances in a mathematical sense; in particular,
they do not need to be positive or symmetric.

It is also known that for the TSP on symmetric random matrices with $\rho(0) = 1$, the mean
length of the minimal tour is ([15],[16]) $D_0 = \lim_{N\to\infty}\langle D(TSP)\rangle = 2.041\ldots$, and the next $1/N$
corrections are ([17],[18])

$$\langle D(TSP)\rangle = D_0\left(1 - \frac{0.1437}{N} + \frac{10.377}{N^2} + \cdots\right). \qquad (3)$$

Different probabilistic relations among the problems considered in the previous
paragraphs are also well known in the literature. Namely, since the seminal work of Karp
[19] we know that for purely asymmetric random matrices with uniformly distributed entries
the expected lengths of the solutions of the TSP and of the AP coincide in the large N limit.
See also [20] and references therein for more precise estimates of this convergence.

The case of symmetric random matrices is however different, and in this situation the
expected lengths of the solutions of the TSP and of the AP do not coincide in the large N limit.
A different problem that has been shown to be closer to the TSP in probabilistic terms is the
above-mentioned 2FP, where one-cycles and two-cycles are excluded. In ref. [21] it is shown
that the expected values of the minimal distance for the TSP and the 2FP with a symmetric random
matrix coincide in the large N limit. These results make clear that the structure of cycles in
the optimal permutation for the AP depends strongly on the symmetry of the distance matrix
and give the clue to compare, at a probabilistic level, the different related problems.

Actually, in a recent paper [22], we found that depending on the characteristics of
the distance matrix the AP can interpolate between those situations which are near the SM
problem (in the sense that the optimal permutation is composed of approximately N/2 cycles)
and those whose optimal permutation is composed of a few cycles (just one in some cases)
and in which one-cycles and two-cycles are absent. The latter can be considered near the TSP or 2FP solution.
The transition between both limits is governed by the correlation of the distances $d_{ij}$ and $d_{ji}$:
for positive correlations the AP is in the "SM regime", whereas for anticorrelated
distances it is "near" the TSP regime. The transition point is located where there is no
correlation between the entries $d_{ij}$ (that is, all the distances are independent random variables),
a situation that can be solved analytically as we shall see.

In this paper we shall study the expected number of $k$-cycles in the optimal permutation
and its dependence on the symmetry of the distance matrix. We shall show analytic and
numerical results with special emphasis on the large N limit. In particular, we relate
the probability of a permutation to be the solution of the AP to the number of one-cycles and
two-cycles it contains. This ansatz can account for the numerical results with high accuracy.

The paper is organised as follows. In the next section we describe the problem with
full precision. The numerical results for the expected value of the number of $k$-cycles are
presented in section 3. In the next three sections we give analytic arguments to explain the
numerical results in the three regimes: the pure random case, the antisymmetric region and the
symmetric one. We finally end the paper with some comments and conclusions.

2. Description of the problem 

Given an $N \times N$ matrix $M = (d_{ij})$ we are interested in the permutation $\sigma \in S_N$ that minimises
the total distance

$$D_\sigma = \sum_{i=1}^{N} d_{i\sigma(i)}.$$

This problem is usually called the assignment problem or bipartite matching problem.

The novelty of our approach is that rather than looking at the minimum distance itself we
focus on the permutation $\sigma$ that gives this minimum. More concretely, we are interested in the
number of $k$-cycles, $p_k$, $k = 1,\dots,N$, in the permutation $\sigma$ (note that these numbers determine
the conjugacy class of $\sigma$ inside $S_N$).
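The cycle numbers $p_k$ can be read off mechanically from a permutation. The following is a minimal sketch for illustration only, not the code used for the paper; the function name `cycle_counts` and the 0-based list encoding of the permutation (`perm[i]` is the image of `i`) are assumptions.

```python
from collections import Counter


def cycle_counts(perm):
    """Return {k: p_k}, the number of k-cycles of a permutation.

    perm is a sequence with perm[i] = image of i (0-based indices).
    """
    seen = [False] * len(perm)
    counts = Counter()
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Follow the cycle that contains `start` and measure its length.
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        counts[length] += 1
    return dict(counts)
```

For example, the permutation encoded as `[1, 0, 2, 4, 5, 3]` has one one-cycle, one two-cycle and one three-cycle.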

From this point of view we shall consider equivalent those matrices M whose minimum 
total distance corresponds to permutations in the same conjugacy class. This implies the 
following equivalence relation: 

i) $(d_{ij}) \sim (a\,d_{ij} + c)$, $a, c \in \mathbb{R}$, $a > 0$

ii) $(d_{ij}) \sim (d_{\pi(i)\pi(j)})$, $\pi \in S_N$

iii) $M = (d_{ij}) \sim M' = (d_{ji})$. (4)

In this paper M is a random matrix that depends on a constant $\lambda$ (we sometimes denote
it by $M_\lambda$), and it is constructed in the following way: take a random $N \times N$ matrix $R = (R_{ij})$
whose entries are identically distributed, independent, real random variables with probability
density $\rho$; then the entries of $M_\lambda = (d_{ij})$ are given by

$$d_{ij} = R_{ij} + \lambda R_{ji}.$$

Note that, unlike the others, the diagonal elements depend on a single random variable and
read $d_{ii} = (1 + \lambda) R_{ii}$. Observe that $M_\lambda$ is symmetric for $\lambda = 1$, antisymmetric for $\lambda = -1$ and
purely random (without any correlation among its entries) for $\lambda = 0$.
From the definition of $M_\lambda$ we have

$$M_{1/\lambda} = \frac{1}{\lambda} M_\lambda^{t},$$

and, therefore, $M_\lambda \sim M_{1/\lambda}$ for $\lambda > 0$ and $M_\lambda \sim -M_{1/\lambda}$ for $\lambda < 0$.
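The construction of the matrix and the relation between the matrices at the parameter value and at its inverse can be checked numerically. The sketch below is only illustrative: the helper names `random_matrix`, `build_m` and `transpose`, the uniform density and the fixed seed are assumptions, not part of the original text.

```python
import random


def random_matrix(n, rng):
    """N x N matrix R with independent entries, uniform on [0, 1)."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def build_m(r, lam):
    """Matrix with entries d_ij = R_ij + lam * R_ji."""
    n = len(r)
    return [[r[i][j] + lam * r[j][i] for j in range(n)] for i in range(n)]


def transpose(m):
    return [list(row) for row in zip(*m)]


rng = random.Random(0)  # fixed seed, used only for reproducibility
r = random_matrix(4, rng)
lam = 0.5
m_lam = build_m(r, lam)
m_inv = build_m(r, 1.0 / lam)

# The matrix at 1/lam should equal (1/lam) times the transpose of the matrix at lam.
scaled_t = [[x / lam for x in row] for row in transpose(m_lam)]
max_diff = max(abs(m_inv[i][j] - scaled_t[i][j])
               for i in range(4) for j in range(4))
```

In this sketch `max_diff` vanishes up to rounding, and `build_m(r, 1.0)` and `build_m(r, -1.0)` return a symmetric and an antisymmetric matrix respectively.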

As mentioned before, we are interested in the number of $k$-cycles, or rather
in its expected value in the distribution generated by R, which we call $P_k(\lambda) = \langle p_k\rangle_\lambda$. We shall
consider $\lambda \in [-1,1]$, which ranges from the antisymmetric matrix for $\lambda = -1$ to the symmetric
one for $\lambda = 1$. On the other hand, given the previous equivalence ($M_\lambda \sim M_{1/\lambda}$ for $\lambda > 0$),
the results with $\lambda \in (0,1]$ repeat themselves for $1/\lambda$. Then in an effective way we cover the
whole positive real line. For the negative part things are different as we have $M_\lambda \sim -M_{1/\lambda}$ for
$\lambda < 0$; but, if the probability density for the entries of R is such that $\rho(x) = \rho(c - x)$ for some
constant c, then the distribution of the optimal permutation with $\lambda \in [-1,0)$ is again identical
to the one for $1/\lambda$.

In the next sections we shall present the results for $P_k(\lambda)$ and $\langle n_c\rangle_\lambda$, where $n_c = \sum_k p_k$
is the total number of cycles in the optimal permutation. It is interesting to observe how they
change with $\lambda$ from the antisymmetric point, $\lambda = -1$, to the symmetric one, $\lambda = 1$. Different
values for the dimension N are considered to study the large N limit.
We also vary the distribution $\rho$ used to define the model. We mainly focus on the uniform
distribution on [0, 1], with density $\rho_u$, and on the exponential one, $\rho_e(x) = \exp(-x)\,\theta(x)$.
Note that $\rho_u(x) = \rho_u(1 - x)$ and then, in this particular case, the interval $[-1,1]$ for $\lambda$ is enough
to cover the whole real line. On the other hand, as mentioned in the previous section, $\rho_e$
has been extensively used in studies of the assignment problem for random matrices [5],[16],
which motivates our choice.

The two distributions considered in the previous paragraph have the same limit for the
density at the minimum of its support, $\rho_u(0) = \rho_e(0) = 1$. Many of the results obtained in
the next sections hold independently of the distribution used to generate the random matrix
provided its density function has a non-zero limit at the minimum of its support. The same
property is invoked in [5],[16] to have a minimal distance with finite limit when N goes to
infinity.

3. Numerical results. 

We carried out a numerical simulation of the statistical ensemble described in the previous
section. For that we generated between $10^5$ and $10^6$ random instances of $M_\lambda$, using the
corresponding probability distributions for the elements. The number of instances depends
on the dimension of the matrix, which ranges from N = 40 to N = 1200.

Once we generate the matrix we solve the assignment problem for it using the
algorithm of R. Jonker and A. Volgenant [3] and compute the number of $k$-cycles obtained
in this way. In Fig. 1 we plot the value of $\langle n_c\rangle = \sum_k P_k$; there one can see the phase transition
between the two regimes of $\langle n_c\rangle$ for $\lambda < 0$ and $\lambda > 0$. In the first case ($\lambda < 0$) the expected
value of $n_c$ behaves like $\log(N)$ and is (almost) constant with $\lambda$. For $\lambda > 0$ the values of $\langle n_c\rangle$
grow linearly with N and $\lambda$ [22].

To understand the behaviour of $\langle n_c\rangle$ in both regimes we analyse separately the average
number of $k$-cycles, $P_k$, as a function of $\lambda$ and k. In the rest of the section we present the values
obtained in the numerical simulation. In the following sections we shall give a theoretical
explanation of these results.
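The simulation procedure can be mimicked at toy scale. The sketch below is not the authors' code: it replaces the Jonker and Volgenant algorithm by a brute-force search over all permutations, which is only feasible for very small N, and the names `build_m`, `optimal_permutation`, `number_of_cycles` and `mean_cycles`, the uniform density and the seed are assumptions made for illustration.

```python
import itertools
import random


def build_m(r, lam):
    """Matrix with entries d_ij = R_ij + lam * R_ji."""
    n = len(r)
    return [[r[i][j] + lam * r[j][i] for j in range(n)] for i in range(n)]


def optimal_permutation(m):
    """Brute-force minimiser of sum_i m[i][perm[i]] (small N only)."""
    n = len(m)
    return min(itertools.permutations(range(n)),
               key=lambda p: sum(m[i][p[i]] for i in range(n)))


def number_of_cycles(perm):
    """Total number of cycles of a permutation given as a sequence."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = perm[i]
    return cycles


def mean_cycles(n, lam, samples, seed=0):
    """Monte Carlo estimate of the mean number of cycles of the optimum."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        r = [[rng.random() for _ in range(n)] for _ in range(n)]
        total += number_of_cycles(optimal_permutation(build_m(r, lam)))
    return total / samples
```

A call such as `mean_cycles(4, 0.0, 2000)` gives an estimate for uncorrelated entries; for realistic sizes a polynomial-time assignment solver is needed instead of the exhaustive search.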

i) One-cycles: In the second plot (Fig. 2) we show $P_1$ as a function of $\lambda$ for different
values of the dimension N. The dots correspond to $\rho = \rho_u$ for dimensions 40, 200, 400, 800
and 1200. The joined plots represent the results for $\rho = \rho_e$ with N = 40, 200 and 1200. We
show no error bars because these are negligible.

We observe that $P_1$ vanishes in all cases in the left part of the diagram, it attains a common
value $P_1 = 1$ for $\lambda = 0$ and finally it takes a value that grows like $\sqrt{N}$ for $\lambda = 1$. We finally
note that the joined plots, corresponding to a different probability density $\rho = \rho_e$, lie very
close to their respective dots (for $\rho = \rho_u$) and the fit gets better as N grows.

The scaling of $P_1$ with $\sqrt{N}$ is shown in the inset of Fig. 2, where we plot $P_1/\sqrt{N}$ as a
function of $\lambda$ for different values of N.

ii) Two-cycles: In the next plot (Fig. 3) we represent $2P_2$ versus $\lambda$. As in the previous
case we show it for different values of the dimension and different distributions: the dots
correspond to $\rho = \rho_u$ and the joined plots to $\rho = \rho_e$.

We again see that $2P_2$ vanishes near $\lambda = -1$, takes the value $2P_2 = 1$ for $\lambda = 0$ and
grows, in an approximately linear way, in the symmetric region, $\lambda > 0$, to a value close to N
for $\lambda = 1$. We also observe that the points corresponding to $\rho = \rho_u$ fit very well with those of
the joined plot corresponding to $\rho = \rho_e$. The inset shows the linear scaling of $P_2$ with N, for
N > 200.

iii) Three-cycles: The situation changes drastically when we plot $3P_3$ as a function of $\lambda$
in Fig. 4. The dots correspond to N = 40, 200 and 1200 for $\rho = \rho_u$. The joined plot represents
the case of dimension 1200 with $\rho = \rho_e$.

Figure 1. Mean value of the number of cycles of the optimal solution for the
assignment problem at different values of $\lambda$ and N. The dots and the joined plots
are obtained with the distributions $\rho = \rho_u$ and $\rho = \rho_e$ respectively.

We see that $3P_3$ takes a constant value equal to 1 for almost all values of $\lambda$ and all values
of N and $\rho$. Only near $\lambda = 1$ do things depend on N, and as N grows the value of $3P_3(\lambda = 1)$
tends to 1. This limiting behaviour is common to all probability densities $\rho$.

Similar results are obtained for other odd cycles of small length compared to N, i.e. $5P_5$
or $7P_7$ are equal to 1 for all values of $\lambda$ except near $\lambda = 1$, but they tend to 1 everywhere when
N tends to infinity.

iv) Four-cycles: In the next plot (Fig. 5) we represent the behaviour of four-cycles, plotting
$4P_4$ versus $\lambda$ for different values of N and $\rho$. Dots represent the values obtained for different
dimensions N = 40, 200 and 1200, all with the uniform distribution, with density $\rho_u$. The
joined plot corresponds to N = 1200 with $\rho = \rho_e$.

Comparing with the previous plot of $P_3$, we see no change in the left part, $\lambda < 0$. However
the right half is quite different. We observe that $P_4$ always vanishes at the symmetric point,
and it follows a smooth curve (even in the large N limit) from $4P_4 = 1$ at $\lambda = 0$ to $P_4 = 0$ for
$\lambda = 1$. A similar result is obtained for other short cycles of even length like $P_6$, $P_8$: all of
them vanish at $\lambda = 1$ and only the shape of the curve changes; it is more horizontal near $\lambda = 0$
and steeper as we approach the symmetric point.

v) Intermediate cycles: In Fig. 6 we show the cycles of intermediate length for
dimension 200 and the density $\rho_u$. As an example we draw $kP_k$ for k = 50, 100 and 150. We
see that, as in previous cases, the behaviour for $\lambda < 0$ is always constant and equal to 1. For
positive $\lambda$ we see a fast transition from 1 to 0 at a value of $\lambda$ that diminishes as k increases.
Other intermediate values of k and different values of N or $\rho = \rho_e$ give similar results (see
also Fig. 13 for odd values of k).

vi) (N-1)-cycles: In Fig. 7 we draw $(N-1)P_{N-1}$ for N = 40, 200, 400 and $\rho = \rho_u$. We
see a peak, sharper as N increases while its maximum moves toward $\lambda = 0$. It always takes
the unit value at $\lambda = 0$. As before, different distributions give similar results. This plot, as
well as those of $(N-2)P_{N-2}$ and $(N-3)P_{N-3}$, which are plotted in Fig. 9, are qualitatively
very different from the previous ones and also different from each other. In section 5 we shall
introduce a simple ansatz that accounts for this with great accuracy.

Figure 2. Average number of one-cycles in the optimal solution for the assignment
problem at different values of $\lambda$ and N. The dots are obtained with the uniform
distribution with density $\rho = \rho_u$, and the joined plots with $\rho = \rho_e$. Statistical
errors corresponding to three standard deviations are not visible. The inset shows
the behaviour of $P_1/\sqrt{N}$ as a function of $\lambda$ for different values of N.

vii) N-cycles: Finally, in Fig. 8 we present the results for $NP_N$ for different
dimensions. Note that it is again constant near $\lambda = -1$ but, contrary to the previous cases, the
constant is not 1 but rather $e^{3/2} = 4.4816\ldots$. It takes the value 1 for $\lambda = 0$ and vanishes for
$\lambda > 0$. The width of the transition is inversely proportional to N. Different distributions give
similar results.

To summarise the results of this section, we have that for small cycles with odd $k > 2$,
$kP_k \approx 1$ for all $\lambda$ in the large N limit. Small cycles with even $k > 2$ have a smooth decay to 0
at $\lambda = 1$. For cycles of intermediate length $kP_k \approx 1$ from $\lambda = -1$ until it has an abrupt decay
at a positive value of $\lambda$ that depends on k. Cycles of length close to N behave very
differently from one another. And finally, one-cycles and two-cycles are absent for $\lambda < 0$ and grow
like $\sqrt{N}$ and N respectively for $\lambda > 0$.
Figure 3. Average number of two-cycles (multiplied by 2) in the optimal solution for
the assignment problem at different values of $\lambda$ and N. The results obtained with the
densities $\rho_u$ ($\rho_e$) are displayed as points (joined plot) respectively. Statistical errors
are negligible. The inset corresponds to $P_2/N$ versus $\lambda$ for different values of N.



4. Solution of the model for $\lambda = 0$.

We start the theoretical study of the model by analysing the point $\lambda = 0$. In this case
$M_0 = R$ and the entries of our matrix are identically distributed, independent random variables. Due to
this fact we can show that all permutations $\sigma$ have the same probability of giving rise to the
minimal distance.

The proof is very simple. Given $M_0 = (d_{ij})$ call

$$\rho(M_0) = \prod_{i,j} \rho(d_{ij})$$

the probability distribution in the space of matrices for $\lambda = 0$. It is then clear that

$$\rho\big((d_{ij})\big) = \rho\big((d_{\pi(i)j})\big)$$

for any permutation $\pi \in S_N$. But if $\sigma$ is the permutation that minimises the distance $D_\sigma$ for
$(d_{ij})$, then $\sigma \circ \pi$ gives the minimum distance for $(d_{\pi(i)j})$. This implies that $\sigma$ and $\sigma \circ \pi$ have
the same probability of being the optimal permutation, which leads to the uniform distribution
on $S_N$.

Once we have established that at $\lambda = 0$ all permutations have the same probability, our
problem is a purely combinatorial one, and reduces to computing how many $k$-cycles there are
in $S_N$. This number, which we call $\nu_N(k)$, is well known to be

$$\nu_N(k) = \frac{N!}{k},$$

as one can derive from simple counting arguments, i.e. $\nu_N(k) = \binom{N}{k}(k-1)!\,(N-k)!$, where
the different factors count respectively the possible choices of k indices to form the cycle, their
cyclic orderings and the permutations of the rest of the indices. Note that in this way every permutation
is counted as many times as the number of $k$-cycles it contains, hence the result follows§.

Figure 4. Average number of 3-cycles (multiplied by 3) in the optimal solution for
the assignment problem. Symbols correspond to $\rho_u$ and different values of N and
the joined plot is for $\rho_e$ and N = 1200. The error bars correspond to three standard
deviations from the mean.

Figure 5. Mean value of 4-cycles (multiplied by 4) in the optimal solution for the
assignment problem. Symbols correspond to $\rho_u$ and different values of N and the
joined plot is for $\rho_e$ and N = 1200. Error bars represent three standard deviations.

Figure 6. Average number of $k$-cycles (multiplied by k) in the optimal solution for the
assignment problem at different values of k and $\lambda$ for N = 200. Error bars represent
three standard deviations.

Figure 7. Average number of (N-1)-cycles (times N-1) in the optimal solution for
the assignment problem at different values of $\lambda$ and N. Error bars represent three
standard deviations. Note the common value $(N-1)P_{N-1} = 1$ at $\lambda = 0$ for all values
of N.

Figure 8. Average number of N-cycles (times N) in the optimal solution for the
assignment problem at different values of $\lambda$ and N. Error bars represent three
standard deviations.
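The counting argument can be verified by exhaustive enumeration for small N. This is a minimal sketch under the assumption that permutations are encoded as 0-based tuples; the function name `total_k_cycles` is hypothetical.

```python
import itertools
import math


def total_k_cycles(n):
    """Return {k: total number of k-cycles summed over all permutations of n items}."""
    totals = {k: 0 for k in range(1, n + 1)}
    for perm in itertools.permutations(range(n)):
        seen = [False] * n
        for start in range(n):
            if seen[start]:
                continue
            # Measure the length of the cycle containing `start`.
            length, i = 0, start
            while not seen[i]:
                seen[i] = True
                i = perm[i]
                length += 1
            totals[length] += 1
    return totals


totals_5 = total_k_cycles(5)
# Closed form from the counting argument: choose k indices, order them
# cyclically, and permute the remaining indices freely.
closed_form_5 = {k: math.comb(5, k) * math.factorial(k - 1) * math.factorial(5 - k)
                 for k in range(1, 6)}
```

For N = 5 the enumeration and the closed form agree, and both equal 120 divided by k.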

For later purposes we shall present here a different, more cumbersome, way to derive
$\nu_N(k)$ that makes use of the generating function [23],[25]. Let

$$G(x) = \sum_{m=1}^{\infty} \frac{x^m}{m} = \log\left(\frac{1}{1-x}\right)$$

be the generating function for the number of $k$-cycles in $S_k$, in the sense that

$$\frac{d^k}{dx^k} G(x)\Big|_{x=0} = (k-1)!.$$



§ We can also use the following iteration: $\nu_N(k) = (N - k + \delta_{k1})\,\nu_{N-1}(k) + (k-1)\,\nu_{N-1}(k-1)$. The first term in
the iteration counts the number of $k$-cycles that persist when one adds a new index, while the second term stands for
the number of ways one can add a new index to a $(k-1)$-cycle to make it one unit larger. The $\delta_{k1}$ is there because
for one-cycles, when adding a new index linked to itself rather than to any of the preexisting ones, the number of
one-cycles is increased by one.
But we rather want to compute the number of $k$-cycles in $S_N$. To do this we observe that the
generating function for the permutations in $S_N$ is obtained by simply taking the exponential

$$e^{G(x)} = \frac{1}{1-x}.$$

The procedure to obtain the number of $k$-cycles in $S_N$ is then simple. We introduce

$$G_a(x) = x + \frac{1}{2}x^2 + \cdots + \frac{a}{k}x^k + \frac{1}{k+1}x^{k+1} + \cdots$$

so that when we take the exponential of $G_a$ the power of $a$ in every term indicates the number
of $k$-cycles that the corresponding permutation contains. Therefore, $\nu_N(k)$ is given by

$$\nu_N(k) = \frac{d}{da}\Big|_{a=1}\, \frac{d^N}{dx^N}\Big|_{x=0} e^{G_a(x)} = \frac{d^N}{dx^N}\Big|_{x=0}\, \frac{x^k}{k}\,\frac{1}{1-x} = \frac{N!}{k}. \qquad (5)$$

The expected number of $k$-cycles for $\lambda = 0$ is then

$$P_k(\lambda = 0) = \frac{\nu_N(k)}{N!} = \frac{1}{k}.$$



Note that this result is independent of N and of the probability density $\rho$ we used to generate
the ensemble. This explains why in all the results shown in the previous section $kP_k = 1$ for
$\lambda = 0$. Finally, the expected value of $n_c$ is:

$$\langle n_c\rangle_{\lambda=0} = \sum_{k=1}^{N} P_k(\lambda = 0) = H_N, \qquad (6)$$

where $H_N = \sum_{k=1}^{N} 1/k$ is the N-th harmonic number.
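The combinatorial results of this section can be reproduced with exact integer arithmetic from the iteration quoted in the footnote. The sketch below is illustrative only; the names `nu_table` and `expected_cycles` are assumptions, and the table entry `nu[n][k]` stands for the total number of k-cycles in the permutations of n indices.

```python
from fractions import Fraction
import math


def nu_table(n_max):
    """Table nu[n][k] built with the iteration
    nu_n(k) = (n - k + [k == 1]) * nu_{n-1}(k) + (k - 1) * nu_{n-1}(k - 1)."""
    nu = [[0] * (n_max + 2) for _ in range(n_max + 1)]
    nu[1][1] = 1  # a single index forms one one-cycle
    for n in range(2, n_max + 1):
        for k in range(1, n + 1):
            delta = 1 if k == 1 else 0
            nu[n][k] = ((n - k + delta) * nu[n - 1][k]
                        + (k - 1) * nu[n - 1][k - 1])
    return nu


def expected_cycles(n):
    """Mean number of cycles of a uniformly random permutation of n indices."""
    nu = nu_table(n)
    return sum(Fraction(nu[n][k], math.factorial(n)) for k in range(1, n + 1))
```

With this table each entry equals the factorial of n divided by k, and `expected_cycles(n)` reproduces the harmonic number exactly as a fraction.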



5. The antisymmetric region, $\lambda < 0$.

In this section we study the behaviour of $P_k(\lambda)$ for $\lambda < 0$. We start with the observation that
one-cycles and two-cycles are strongly suppressed for $\lambda = -1$. The absence of one-cycles and two-cycles
in the solution of the AP makes it equivalent to the corresponding 1FP, as was mentioned in
the introduction.

This fact can be heuristically understood if one considers that the optimal permutation for
$M_{-1}$ comes from the choice of N elementary distances $d_{ij}$ out of $N^2$ and, apart from the diagonal
elements which are 0, half of the remaining elements are negative and half are positive.
Then, for large N, the shortest total distance will typically be obtained when we choose only
negative elements, and this excludes the possibility of having one-cycles ($d_{ii} = 0$) and
two-cycles ($d_{ij} = -d_{ji}$), which always include non-negative entries. The rest of the cycles have no
correlation among their elements and therefore it is reasonable to assume the equiprobability
of all permutations that do not contain one-cycles or two-cycles. With this assumption we
reduce the problem to a combinatorial one and we can proceed as in section 4.

Our goal, however, is to understand the expected number of $k$-cycles in the whole
negative region $\lambda \in [-1,0]$, which interpolates between the absence of one-cycles and two-cycles
for $\lambda = -1$ and the expected values $P_1(0) = 1$ and $P_2(0) = 1/2$ at $\lambda = 0$. This goal can
be achieved with the following ansatz. We assume that, at least in the large N limit, the
probability for a permutation to give the shortest distance depends only on the number of
one-cycles and two-cycles it contains. This is consistent with the fact that only one-cycles and two-cycles
are sensitive to the symmetry of the matrix; bonds of longer cycles are uncorrelated. Namely,
for a permutation with $p_1$ one-cycles and $p_2$ two-cycles the probability is proportional to
$q_1^{p_1} q_2^{p_2}$, where $q_1$ and $q_2$ vanish for $\lambda = -1$ and $q_1 = q_2 = 1$ for $\lambda = 0$. The new generating
function is then:

$$G_{q_1,q_2}(x) = q_1 x + q_2\frac{x^2}{2} + \sum_{k=3}^{\infty}\frac{x^k}{k} = \log\left(\frac{1}{1-x}\right) + (q_1-1)x + (q_2-1)\frac{x^2}{2}.$$



This implements the idea outlined above, as in the exponential of $G_{q_1,q_2}(x)$ every term has a
weight $q_1^{p_1} q_2^{p_2}$. From this we derive the normalising factor (the total weight of the space of
permutations)

$$\Omega_{q_1,q_2}(N) = \frac{d^N}{dx^N}\Big|_{x=0} e^{G_{q_1,q_2}(x)} = \frac{d^N}{dx^N}\Big|_{x=0}\, \frac{e^{(q_1-1)x+(q_2-1)x^2/2}}{1-x}, \qquad (7)$$

while the expected value for the number of $k$-cycles can be obtained as in the previous section by
introducing the factor $a$ multiplying $x^k$ and taking the derivative of the exponential at $a = 1$.
The result for $k > 2$ is

$$\Omega_{q_1,q_2}(N)\, P_k = \frac{1}{k}\,\frac{d^N}{dx^N}\Big|_{x=0}\, \frac{x^k\, e^{(q_1-1)x+(q_2-1)x^2/2}}{1-x}, \qquad \text{for } k > 2. \qquad (8)$$
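For finite N the ansatz can be evaluated exactly by expanding the generating function as a power series. The sketch below is an illustration, not the authors' computation: the names `series_coefficients`, `omega` and `expected_k_cycles` are assumptions, and the weights of one-cycles and two-cycles are passed as exact rationals. It uses the fact that the exponential factor E(x) satisfies E'(x) = ((q1 - 1) + (q2 - 1) x) E(x), and that dividing by (1 - x) turns coefficients into partial sums.

```python
from fractions import Fraction
import math


def series_coefficients(q1, q2, n_max):
    """Coefficients c_0..c_n_max of exp((q1-1)x + (q2-1)x^2/2) / (1 - x)."""
    a = Fraction(q1) - 1
    b = (Fraction(q2) - 1) / 2
    e = [Fraction(1)]  # coefficients of the exponential factor alone
    for n in range(n_max):
        prev = e[n - 1] if n >= 1 else Fraction(0)
        # (n + 1) e_{n+1} = a e_n + 2 b e_{n-1}
        e.append((a * e[n] + 2 * b * prev) / (n + 1))
    c, running = [], Fraction(0)
    for coeff in e:
        running += coeff  # multiplication by 1/(1-x) gives partial sums
        c.append(running)
    return c


def omega(n, q1, q2):
    """Total weight of all permutations of n indices under the ansatz."""
    return math.factorial(n) * series_coefficients(q1, q2, n)[n]


def expected_k_cycles(n, k, q1, q2):
    """Expected number of k-cycles under the weight q1^p1 * q2^p2."""
    c = series_coefficients(q1, q2, n)
    weight = {1: Fraction(q1), 2: Fraction(q2)}.get(k, Fraction(1))
    return weight * c[n - k] / (k * c[n])
```

Setting both weights to 1 recovers the uniform case, while setting both to 0 makes `omega` count the permutations without one-cycles and two-cycles.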



To compute these quantities we use the singularity analysis approximation [24]. In the case
at hand the N-th coefficient in the power series is approximated by the residue at the pole at
$x = 1$. It then gives

$$\Omega_{q_1,q_2}(N) = N!\left(e^{q_1+q_2/2-3/2} + O\!\left(\frac{|q_1-1|^{N}}{N!} + \frac{|q_2/2-1/2|^{N/2}}{(N/2)!}\right)\right),$$

and

$$\Omega_{q_1,q_2}(N)\,P_k = \frac{N!}{k}\left(e^{q_1+q_2/2-3/2} + O\!\left(\frac{|q_1-1|^{N-k}}{(N-k)!} + \frac{|q_2/2-1/2|^{N/2-k/2}}{(N/2-k/2)!}\right)\right). \qquad (9)$$

For small values of $q_1$, $q_2$ and k (compared with N) this approximation can be used and we
obtain $P_k \approx 1/k$ for $k > 2$, which is compatible with the numerical results of section 3 (see
figures 4, 5 and 6).

$P_1$ and $P_2$ do not follow the general formula but

$$P_1 = \Omega_{q_1,q_2}(N)^{-1}\, \frac{d^N}{dx^N}\Big|_{x=0}\, q_1 x\, e^{G_{q_1,q_2}(x)}, \qquad (10)$$

which, in the singularity analysis approximation, gives

$$P_1 = q_1 + O\!\left(\frac{|q_1-1|^{N-1}}{(N-1)!} + \frac{|q_2/2-1/2|^{N/2-1/2}}{(N/2-1/2)!}\right). \qquad (11)$$

And

$$P_2 = \Omega_{q_1,q_2}(N)^{-1}\, \frac{d^N}{dx^N}\Big|_{x=0}\, \frac{q_2 x^2}{2}\, e^{G_{q_1,q_2}(x)}, \qquad (12)$$

so that

$$P_2 = \frac{q_2}{2} + O\!\left(\frac{|q_1-1|^{N-2}}{(N-2)!} + \frac{|q_2/2-1/2|^{N/2-1}}{(N/2-1)!}\right). \qquad (13)$$

Then for small values of $q_1$ and $q_2$ and the values of N we are considering in the paper (from
40 to 1200) we can take $P_1 = q_1$ and $P_2 = q_2/2$ with very good accuracy (this covers the
$\lambda < 0$ region since there $q_1$ and $q_2$ are less than 1).

For long cycles, $k \sim N$, the singularity analysis approximation is not valid any more. In
this case, however, it is very easy to compute (8) explicitly. Therefore, with the precision given
by that of $\Omega_{q_1,q_2}(N)$, we get:

$$N P_N \approx e^{3/2-q_1-q_2/2}$$
$$(N-1) P_{N-1} \approx q_1\, e^{3/2-q_1-q_2/2}$$
$$(N-2) P_{N-2} \approx \left(q_1^2/2 + q_2/2\right) e^{3/2-q_1-q_2/2}$$
$$(N-3) P_{N-3} \approx \left(1/3 + q_1 q_2/2 + q_1^3/6\right) e^{3/2-q_1-q_2/2} \qquad (14)$$
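The long-cycle expressions can be compared with the exact finite-N values of the ansatz. The following sketch uses floating-point power-series coefficients of the exponential factor divided by (1 - x); the function names and the sample values of the two weights are assumptions chosen for illustration.

```python
import math


def series_coefficients(q1, q2, n_max):
    """Coefficients c_0..c_n_max of exp((q1-1)x + (q2-1)x^2/2) / (1 - x)."""
    a, b = q1 - 1.0, (q2 - 1.0) / 2.0
    e = [1.0]
    for n in range(n_max):
        prev = e[n - 1] if n >= 1 else 0.0
        e.append((a * e[n] + 2.0 * b * prev) / (n + 1))
    c, running = [], 0.0
    for coeff in e:
        running += coeff  # partial sums implement the factor 1/(1-x)
        c.append(running)
    return c


def long_cycle_values(n, q1, q2):
    """Exact k*P_k for k = N, N-1, N-2, N-3 (requires N >= 6)."""
    c = series_coefficients(q1, q2, n)
    return [c[j] / c[n] for j in range(4)]  # entry j corresponds to k = N - j


def asymptotic_values(q1, q2):
    """Right-hand sides of the four long-cycle expressions."""
    pref = math.exp(1.5 - q1 - q2 / 2.0)
    return [pref,
            q1 * pref,
            (q1 ** 2 / 2.0 + q2 / 2.0) * pref,
            (1.0 / 3.0 + q1 * q2 / 2.0 + q1 ** 3 / 6.0) * pref]


exact = long_cycle_values(200, 0.5, 0.25)
approx = asymptotic_values(0.5, 0.25)
```

For N = 200 and weights below 1 the two lists agree to many digits, which reflects the factorially small corrections of the normalising factor.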




Figure 9. Values of $kP_k$ (for the largest values of k) in the optimal solution
for the assignment problem at different values of $\lambda$ and for N = 200. The points are
the result of our simulation and the error bars represent three standard deviations
from the mean. The joined plot is the theoretical prediction using (14).

In Fig. 9 we plot $kP_k$ for $k = N,\dots,N-3$ and N = 200. The continuous line is the
theoretical value obtained from (14), where we take $q_1 = P_1$ and $q_2 = 2P_2$. One can see that
the agreement is excellent. A similar match holds for the other cases.

Thus, from the previous expressions we see that the behaviour of $P_k$ for $k = 1,\dots,N$
for $\lambda < 0$ is completely determined by $q_1$ and $q_2$. In the rest of the section we shall study
the behaviour with $\lambda$ and N of these two factors. Many of the results presented below are
independent of the distribution used to generate the random matrices, provided the probability
density fulfils the non-vanishing property at the minimum of its support that was discussed in
section 2. In the rest of the paper we shall assume that this property holds.

Our first observation is the relation between $q_1$ and $q_2$ for the same value of N, $\lambda$ and
$\rho$. One can check that $q_2 = q_1^2$. A plot showing the extremely good fit between the two
values as a function of $\lambda < 0$ for N = 200 and different $\rho$ is shown in Fig. 10. This relation
can be expressed as the fact that the probability of a permutation to produce the minimal
total distance is unchanged if we change the permutation by substituting a two-cycle by
two one-cycles. An argument for this comes from the fact that given two indices i and j,
$d_{ij} + d_{ji} = (1+\lambda)(R_{ij}+R_{ji})$ while $d_{ii} + d_{jj} = (1+\lambda)(R_{ii}+R_{jj})$. Then both sums are identically
distributed random variables.




The second important property we observe in the region $\lambda < 0$ is the invariance under
scaling of $\lambda$ and N (see Fig. 11). In fact one can check that for a given probability density $\rho$,
$q_1(\lambda,N) = q_1(\mu\lambda, \mu^{-1}N)$. And as any $P_k$ can be obtained from $q_1$ according to the formulae
above, this scale invariance is true also for any $P_k$.

Figure 11. The points in the upper curve represent the values of $q_1$ as a function
of $\lambda N$ for different values of $\lambda$ and N and for matrices generated with probability
density $\rho_u$. The tangent line at $\lambda = 0$ is the theoretical prediction given by (15). The
lower curve is the same but for matrices generated with the exponential density $\rho_e$.

The scaling relations presented in the previous paragraph are obtained by taking a fixed
probability density $\rho$ to generate the ensemble, while we change $\lambda$ and N. We want to examine
now how $q_1$ depends on the distribution near the random point $\lambda = 0$. Given the result that
we can rescale $\lambda$ and N without changing $q_1$, it is natural to think that $q_1$ can be determined
by looking at only a few elements of the matrix $M_\lambda$. A confirmation of this conjecture is not
available yet, but some partial results can be verified. Concretely, we can reproduce the slope
of $q_1$ at $\lambda = 0$, which depends on the distribution, by the following formula:

$$\frac{\partial q_1}{\partial\lambda}(\lambda = 0, N) = \alpha N,$$

where $\alpha$ depends solely on the distribution and is determined as follows:
for a given value of $\lambda$ fix $i \neq j$ and define $\xi_e = \min(d_{ij}, d_{ji})$; also define
$\xi_d = \min(d_{ii}, d_{jj})$. Now compute

$$\Theta(\lambda) = \frac{1}{2}\big\langle \theta(\xi_d - \xi_e)\big\rangle_\lambda,$$

where with $\theta$ we denote the Heaviside step function. The coefficient $\alpha$ is obtained by

$$\alpha = -\frac{d}{d\lambda}\Theta\Big|_{\lambda=0}.$$

As we mentioned before, the value of $\alpha$ depends only on the probability density $\rho$ and can be
computed with the following formula

$$\alpha = 2\int \rho^2(x) \int_x^{\infty} \rho(y) \int_x^{\infty} (z-x)\,\rho(z)\,dz\,dy\,dx. \qquad (15)$$
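The integral for the slope coefficient can be evaluated numerically. The sketch below rests on an assumption about the integration limits, which are not legible in the formula: the inner integrals over y and z are taken from x to the upper end of the support, a choice that matches the interpretation of the four-variable probability and the slopes 1/2 and 1/4 quoted in the text. The function name `slope_coefficient`, the midpoint grid and the truncation of the exponential density at 40 are also illustrative choices.

```python
import math


def slope_coefficient(rho, lo, hi, steps=20000):
    """Evaluate 2 * int rho(x)^2 S(x) T(x) dx on [lo, hi], where
    S(x) = int_x^hi rho(y) dy and T(x) = int_x^hi (z - x) rho(z) dz.
    Since dT/dx = -S(x), T is accumulated as the integral of S from the right.
    """
    h = (hi - lo) / steps
    dens = [rho(lo + (i + 0.5) * h) for i in range(steps)]  # midpoint samples
    s_edge = [0.0] * (steps + 1)
    t_edge = [0.0] * (steps + 1)
    for i in range(steps - 1, -1, -1):
        s_edge[i] = s_edge[i + 1] + dens[i] * h
        t_edge[i] = t_edge[i + 1] + 0.5 * h * (s_edge[i] + s_edge[i + 1])
    total = 0.0
    for i in range(steps):
        s_mid = 0.5 * (s_edge[i] + s_edge[i + 1])
        t_mid = 0.5 * (t_edge[i] + t_edge[i + 1])
        total += dens[i] ** 2 * s_mid * t_mid * h
    return 2.0 * total


uniform_slope = slope_coefficient(lambda x: 1.0, 0.0, 1.0)
exponential_slope = slope_coefficient(lambda x: math.exp(-x), 0.0, 40.0)
```

Under these assumptions the uniform density gives a value close to 1/4 and the exponential density a value close to 1/2.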

The meaning of $\Theta$ is the following: it measures the probability for an off-diagonal element of
a pair to be smaller than its partner and than the two corresponding entries in the diagonal. It reproduces, to some extent,
at a small scale (only four random variables involved) the mechanism for the disappearance
of one-cycles (diagonal entries) in the real problem as $\lambda$ becomes negative. Recall that the
argument for the disappearance of one-cycles and two-cycles was based on the fact that for negative
$\lambda$ one of every pair of off-diagonal terms is smaller (on average) than the diagonal terms
(or than half the sum of the off-diagonal terms). This implies that the appearance of one-cycles and
two-cycles in the optimal permutation is disfavoured. This property is quantitatively studied
by means of the function $\Theta$.

Our result has been checked with different distributions and the agreement is very good.
As an example we show in Fig. 11 the lines for $\rho_e$ and $\rho_u$ with slope 1/2 and 1/4 respectively,
as obtained from (15). We can see that these lines are, as predicted, tangent to the curve of $P_1$
at $\lambda = 0$.
6. The symmetric region $\lambda > 0$

As shown in Figs. 2 and 3, the first relevant fact in this region is that $P_1$ and $P_2$ grow from
1 and 1/2 respectively for $\lambda = 0$, to values proportional to $\sqrt{N}$ in the first case and to
N in the second for $\lambda = 1$. A first attempt to account for this behaviour is to adjust the
corresponding parameters $q_1$ and $q_2$ to fulfil equations (10) and (12) (note that now $q_1$ and $q_2$
can be > 1, so the terms of order $(q_1-1)^{N-2}/(N-2)!$ and $(q_2/2-1/2)^{N/2-1}/(N/2-1)!$
can be important). The values of $q_1$ and $q_2$ obtained in this way are used to compute $P_k$ for
different values of $\lambda$.

Figure 12. Numerical value of $3P_3$ (dots) and the theoretical prediction using
equations (10) and (12) with the corrected values of $q_1$ and $q_2$ (continuous line)
and without the corrections (discontinuous line).

This procedure, however, fails to predict the numerical results in two respects.
First, if we try to fit $P_3$ we obtain a large deviation from the numerical value near
the symmetric point. This is shown in Fig. 12, where the dots represent the numerical value
and the dashed line represents the theoretical prediction obtained as outlined above. Second,
the disappearance of even cycles at $\lambda = 1$, as shown in Fig. 5, is not taken into account
within this approximation, i.e. the theoretical value for $P_4$ does not vanish at $\lambda = 1$.
These two facts happen to be connected, as discussed in the following paragraphs.

It is first important to understand why even cycles disappear when $\lambda = 1$. The reason
is very simple. Suppose we had a cycle of even length, i.e. $\sigma(i_m) = i_{m+1}$,
$m = 1, \dots, 2L$, with $i_m \neq i_{m'}$ for $m \neq m'$ except $i_1 = i_{2L+1}$. Then either
the links in odd position, $d_{i_{2l-1} i_{2l}}$, or those in even position,
$d_{i_{2l} i_{2l+1}}$, have the smaller sum. Assume that

$$\sum_{l=1}^{L} d_{i_{2l} i_{2l+1}} < \sum_{l=1}^{L} d_{i_{2l-1} i_{2l}};$$

then the new permutation $\sigma'$, which is equal to $\sigma$ except for
$\sigma'(i_{2l+1}) = i_{2l}$, $l = 1, \dots, L$, gives a smaller total distance. To see this, it
is enough to realise that, given that $M_1$ is a symmetric matrix, the sum of the odd links for
$\sigma$ is replaced by that of the even links in $\sigma'$, which lowers the total distance.
Hence it is impossible to have cycles of even length larger than two in the optimal permutation
of a symmetric distance matrix.
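
The exchange argument above can be checked on a small example. The following Python sketch is
only an illustration, not code from the paper: the function name `break_even_cycle` and the
list-of-lists matrix layout are assumptions made here. Given a symmetric cost matrix and one
even cycle of a permutation, it keeps the cheaper of the two alternating sets of links and uses
each kept link in both directions, which yields $L$ two-cycles.

```python
def break_even_cycle(d, cycle):
    """Split an even cycle into two-cycles using the cheaper alternating links.

    d     : symmetric cost matrix as a list of lists (d[i][j] == d[j][i])
    cycle : list [i_1, ..., i_2L] of distinct indices with sigma(i_m) = i_{m+1}
            and sigma(i_2L) = i_1
    Returns (pairs, old_cost, new_cost).
    """
    n = len(cycle)
    if n % 2 != 0 or n <= 2:
        raise ValueError("need an even cycle longer than two")
    # links[m] is the cost of the step cycle[m] -> cycle[m + 1] (cyclically).
    links = [d[cycle[m]][cycle[(m + 1) % n]] for m in range(n)]
    old_cost = sum(links)
    # links[0::2] are the 'odd position' links of the text, links[1::2] the 'even' ones.
    start = 0 if sum(links[0::2]) <= sum(links[1::2]) else 1
    pairs = [(cycle[m], cycle[(m + 1) % n]) for m in range(start, n, 2)]
    # Each kept link is traversed in both directions, forming a two-cycle.
    new_cost = sum(d[a][b] + d[b][a] for a, b in pairs)
    return pairs, old_cost, new_cost


# Hypothetical symmetric example: the cycle 0 -> 1 -> 2 -> 3 -> 0 costs 1 + 5 + 2 + 7.
d = [[0, 1, 9, 7],
     [1, 0, 5, 9],
     [9, 5, 0, 2],
     [7, 9, 2, 0]]
pairs, old_cost, new_cost = break_even_cycle(d, [0, 1, 2, 3])
print(pairs, old_cost, new_cost)  # prints [(0, 1), (2, 3)] 15 6
```

Because the matrix is symmetric, the new cost is twice the smaller of the two alternating sums,
so it can never exceed the cost of the original cycle.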

The mechanism for the disappearance of even cycles outlined in the previous paragraph can
be stated by saying that $2L$-cycles break into $L$ two-cycles. This is the key point behind the
improvement of the approximation in order to account for small cycles. The idea is that in
equations (10) and (12), instead of using the value of $P_2$ obtained in the numerical
simulations, we subtract from it the two-cycles that come from what would otherwise be cycles of
even length. The procedure is then clear: we start with a value for $q_1$ and $q_2$, say $P_1$
and $2P_2$; with these values we compute the theoretical number of cycles of even length and
subtract from it the real one obtained in the numerical simulations. These are the cycles that
break into a number of two-cycles. We subtract this number from $P_2$, introduce the new value
of $P_2$ into equation (12) and compute $q_1$ and $q_2$ again. The procedure is iterated until
the desired convergence is reached. In practice, 4 or 5 iterations give very good precision.




Figure 13. Values of $kP_k$ for intermediate values of $k$, $N = 200$ and $\lambda > 0$. The
continuous line is the theoretical prediction.

In Fig. 12 we plot the numerical values for $3P_3$ (dots) and the theoretical curves using
the uncorrected values of $q_1$, $q_2$ (dashed line) and the corrected ones (solid line). We see
that the fit is much better in the second instance. The theoretical prediction can also be
applied to the intermediate cycles, as shown in Fig. 13. The theoretical and numerical values
for $kP_k$ with $N = 200$, using the corrected $q_1$ and $q_2$, show very good agreement.

Our last point is the relation between $q_1$ and $q_2$ that extends to positive values of
$\lambda$ the fit shown in Fig. 10. We find that the dependence changes in this case. A very
good fit is obtained by taking

$$q_2 = e^{\lambda} q_1 (q_1 - \lambda) = F(q_1).$$

As shown in Fig. 14, the agreement is rather good and it gets better in the large $N$ limit.

Figure 14. Values of $q_2$ versus $F(q_1) = e^{\lambda} q_1 (q_1 - \lambda)$ for positive
$\lambda$. The plot includes the points obtained for $N = 40, 200$ and with the probability
densities $p = p_u$ and $p = p_e$.



7. Conclusions and outlook. 

The expected number of $k$-cycles in the optimal permutation of the assignment problem for
random matrices can be understood to great accuracy in terms of only two parameters, $q_1$
and $q_2$, associated with one- and two-cycles. More precisely, the ansatz is that in the large
$N$ limit the probability for a permutation to be the solution of the AP is proportional to
$q_1^{P_1} q_2^{P_2}$, with $P_1, P_2$ the number of one- and two-cycles of the permutation
respectively. The ansatz can be substantiated by considering that, with the cost or distance
matrices used in the paper, only one- and two-cycles are sensitive to the symmetry of the matrix,
as bonds of longer cycles are uncorrelated. On the other hand, in the large $N$ limit we can
consider the occurrence of short cycles as independent events.
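
For very small $N$ the ansatz can be explored by direct enumeration. The sketch below is an
illustration under stated assumptions, not the method used in the paper: it assigns every
permutation of $N$ elements the weight $q_1^{P_1} q_2^{P_2}$ exactly (the paper states the
ansatz only for the large $N$ limit), and the function names `cycle_lengths` and
`expected_cycles` are hypothetical. For $q_1 = q_2 = 1$ all permutations are equally likely and
the expected number of $k$-cycles is $1/k$, in agreement with the values 1 and 1/2 quoted for
$P_1$ and $P_2$ at $\lambda = 0$.

```python
from itertools import permutations


def cycle_lengths(perm):
    """Return the list of cycle lengths of a permutation given as a tuple."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths


def expected_cycles(n, q1, q2):
    """Expected number of k-cycles (k = 1..n) under the weight q1**p1 * q2**p2.

    p1 and p2 are the numbers of one- and two-cycles of each permutation.
    Brute force over all n! permutations, so only usable for small n (assumption).
    """
    totals = [0.0] * (n + 1)
    norm = 0.0
    for perm in permutations(range(n)):
        lengths = cycle_lengths(perm)
        weight = q1 ** lengths.count(1) * q2 ** lengths.count(2)
        norm += weight
        for k in lengths:
            totals[k] += weight
    return {k: totals[k] / norm for k in range(1, n + 1)}


# Unweighted case: every permutation has the same probability.
print(expected_cycles(5, 1.0, 1.0))
```

Setting $q_1 < 1$ or $q_2 < 1$ in `expected_cycles` suppresses one- or two-cycles, while values
above 1 enhance them, which is the qualitative behaviour described for the antisymmetric and
symmetric regions.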

With this ansatz we are able to explain, with great accuracy, the expected number of
$k$-cycles in the solution of the AP for cost matrices ranging from the symmetric to the
antisymmetric one. The parameters undergo an abrupt transition (in the large $N$ limit) when
moving from a mostly symmetric matrix ($\lambda > 0$) to a mostly antisymmetric one
($\lambda < 0$).

We also find some universal scaling relations in the variables which are valid in the
antisymmetric region. Based on this scaling behaviour we are able to give a theoretical
prediction for the slope of $q_1$ at the critical point, $\lambda = 0$.

An open problem is to understand the behaviour of the cycles of even length in the
symmetric region. It is clear that, as argued in the paper, all of them (except the
two-cycles) should be absent at the symmetric point ($\lambda = 1$), but for the moment we do
not know how to explain the curves that the average number of even cycles follows on its way to
zero. Finally, it would be valuable to have a full theoretical study of the model (or a reliable
approximation to it) that could explain the facts mentioned above.